tools: passt/pasta head-to-head comparison harness #81
Perf chase summary — voidbox SLIRP optimisation series

Outcome from the head-to-head comparison this PR enables: voidbox throughput now matches pasta-in-netns, and the SLIRP-engine gap to qemu+passt collapsed from a misleading 122× to a real ~1.6× on apples-to-apples CRR. The work was committed locally on this branch but not pushed — these notes capture findings, methodology, and concrete diff sizes for future review. If you want any of it pushed for review, ping me.

Numbers

TCP CRR (apples-to-apples per the spec)

Cumulative: −35% on CRR p50. Gap to qemu+passt: 2.6× → ~1.6×.

TCP throughput (the real win)

Throughput nearly doubled (+96%). Voidbox is now at line rate against pasta-in-netns (12256 Mbps).

Latency primitives unchanged, at parity
CRR via the voidbox-network-bench harness (with nc per iteration):

| Setup | CRR p50 |
|---|---|
| voidbox-network-bench nc per iter, baseline | 10133 µs |
| voidbox-network-bench nc per iter, after perf series | 10140 µs |
Identical, because that path is dominated by guest-side busybox-nc fork+exec, not by SLIRP. The single-process C-binary CRR (the crr-client tool added in this PR) is the apples-to-apples measurement.
What got optimised
5 perf commits on passt-comparison-harness (local only, not pushed):
| Commit | Title | CRR Δ |
|---|---|---|
| 419694a | perf(virtio-net): hot-path cleanups + suppress redundant IRQ pulses | −10% |
| 84ec9d0 | perf(vmm): IRQ delivery via KVM_IRQFD instead of KVM_IRQ_LINE pair | −12% |
| 9e5c6ef | perf(vmm): KVM_IOEVENTFD for virtio-net TX queue notify | −17% |
| 6d7e228 | perf(virtio-net): lock-free RX hand-off via SegQueue (Option B) | −5% |
| a5aa44d | perf(virtio-net): interrupt_status as Arc&lt;AtomicU32&gt; | parity (architectural) |
Plus 2 diagnostic-tool commits:
| Commit | Title |
|---|---|
| d761fad | tools: crr-client + voidbox-side single-process CRR diagnostic |
| 56c2f3a | tools: bench-qemu-slirp.sh — qemu+libslirp / qemu+passt CRR harness |
Highlights of each perf change
- Hot-path cleanups in virtio-net (`419694a`): replaced per-frame `Vec::concat` allocations with a stack `[u8; 8]`, hoisted `avail.idx` reads out of per-frame loops, batched `used.idx` updates per the virtio spec. Suppressed redundant `KVM_IRQ_LINE` pulses on cycles where no new RX work was queued.
- KVM_IRQFD (`84ec9d0`): replaced the assert `level=1` + deassert `level=0` ioctl pair with a single 8-byte write to a registered eventfd. Kernel-side IRQ assertion bypasses the ioctl round-trip.
- KVM_IOEVENTFD (`9e5c6ef`): the guest's TX `QUEUE_NOTIFY` MMIO write now signals an eventfd in-kernel; the vCPU continues running without exiting. The net-poll thread sees the eventfd via the existing `EpollDispatch` and runs `process_tx_queue` on its own schedule. Eliminates one KVM_RUN exit per packet TX'd by the guest.
- Option B lock-free RX hand-off (`6d7e228`): a `pending_rx: Arc<crossbeam_queue::SegQueue<Vec<u8>>>` field on `VirtioNetDevice`. The net-poll thread pushes frames lock-free; the vCPU drains them in its native MMIO context via a new `flush_pending_rx` method. The `Arc<Mutex<VirtioNetDevice>>` device lock is no longer touched by net-poll on the per-packet path.
- `Arc<AtomicU32>` ISR (`a5aa44d`): `interrupt_status` becomes a directly shareable atomic. The net-poll thread caches a clone at startup and reads/writes it without going through the device mutex. No measured perf delta on the single-vCPU benchmark (within noise) but unblocks future work that lets the dispatcher skip the lock for read-only MMIO accesses.
Final profile under sustained bulk throughput
After the series, with `voidbox-network-bench --bulk-mb 200 --iterations 50`, perf-agent on the voidbox process:
| Function | Flat % | Class |
|---|---|---|
| __clone3 | 32.4% | bench harness host-side thread spawn |
| handle_tcp_frame | 27.0% | 97% of which is TcpStream::write → kernel __GI___write |
| kvm_ioctls::VcpuFd::run | 11.7% | KVM_RUN — guest execution |
| process_guest_frame | 7.1% | 96% of which is __GI___write |
| EventFd::write | 4.1% | our IRQFD + IOEVENTFD writes |
| EpollDispatch::wait_with_timeout | 3.0% | epoll_wait |
| vcpu_run_loop | 2.7% | vCPU main loop |
| VirtioNetDevice::process_tx_queue | 0.6% | descriptor parsing — basically free |
Voidbox's own user-mode SLIRP code is sub-1% of CPU during bulk throughput. The handle_tcp_frame 27% flat is dominated by the kernel TCP send syscall, not user-space work. PMU shows IPC 0.673, cache-miss rate 34/1K (high) but on a low instruction volume — the misses live in the kernel/syscall paths, not in voidbox's NAT logic.
Stopping point
Further user-space optimisation has very little headroom on this workload. The next set of changes would need to be architectural, not point fixes:
- `io_uring` for syscall batching (replace per-packet `write()`/`read()`)
- `splice()`/`sendfile()` zero-copy on the guest→host data path
- MSI-X virtio + multi-queue for vCPU scaling
- Skip the host kernel entirely (TAP + passt-style)
Status
- 5 perf commits + 2 diagnostic-tool commits on `passt-comparison-harness` (local).
- Not pushed — flagged as `wip:`-style work pending review of approach.
- Bench harness commits in this PR (`scripts/bench-pasta.py`, `scripts/bench-compare-pasta.py`, `scripts/bench-qemu-slirp.sh`, `tools/crr-client.c`, `tools/qemu-init.sh`, `examples/crr_singleproc_bench.rs`, `docs/passt-comparison.md`) are reproducible — anyone can re-run the comparison.
Headline correction for the PR body: the original "voidbox 122× slower than pasta" claim was misleading — that was overwhelmingly guest-side nc fork+exec, not voidbox's NAT path. The corrected, apples-to-apples claim should be: voidbox SLIRP is ~1.6× slower than qemu+passt on TCP CRR before optimisation, and within ~10–15% (12 Gbps vs 12.2 Gbps) on throughput after the perf series.
Pull request overview
This PR adds a set of performance-harness tools for comparing VoidBox’s SLIRP networking against passt/pasta, and also introduces substantial VMM/virtio-net changes aimed at reducing VM-exit and lock-contention overhead in the networking hot path.
Changes:
- Add a passt/pasta comparison harness (pasta-side bench runner + markdown comparator) plus a qemu SLIRP-vs-SLIRP CRR harness and a static CRR client.
- Add a VoidBox-side “single process CRR” example to isolate per-iteration process-spawn overhead.
- Optimize virtio-net/VMM networking by introducing a lock-free RX handoff, atomic interrupt status, and KVM irqfd/ioeventfd usage to reduce exits and contention.
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| tools/perf-harness/qemu-init.sh | Guest /init for CRR runs; parses cmdline, configures net, runs client. |
| tools/perf-harness/crr-client.c | Static single-process CRR loop client (connect→req→resp→close). |
| tools/perf-harness/bench-qemu-slirp.sh | Boots a minimal qemu guest and measures CRR vs qemu libslirp/passt backends. |
| tools/perf-harness/bench-pasta.py | Runs throughput/RR/CRR workloads inside a pasta netns and emits JSON Report-like output. |
| tools/perf-harness/bench-compare-pasta.py | Produces side-by-side markdown comparison between voidbox and pasta JSON outputs. |
| src/vmm/mod.rs | net_poll_thread: add irqfd/ioeventfd paths, lock-free RX queueing, and IRQ pulsing changes. |
| src/vmm/cpu.rs | Flush pending RX frames on virtio-net MMIO entry to materialize RX without net-poll holding the device lock. |
| src/devices/virtio_net.rs | Introduce pending_rx SegQueue + atomic interrupt_status; batch used.idx updates; TX/RX hot-path alloc reductions. |
| examples/crr_singleproc_bench.rs | VoidBox-side CRR bench using the same static C client, run inside one guest process. |
| docs/passt-comparison.md | Documentation and usage for the comparison harnesses. |
| Cargo.toml | Add crossbeam-queue dependency for lock-free RX handoff. |
| Cargo.lock | Lockfile updates for crossbeam-queue. |
Comments suppressed due to low confidence (1)
src/devices/virtio_net.rs:776
`reset()` clears `rx_buffer` but does not clear the new lock-free `pending_rx` queue. After a guest device reset (STATUS=0), stale frames already queued by the net-poll thread can still be injected into the RX ring, violating reset semantics. Drain `pending_rx` during reset (pop until empty) or reinitialize it.
```rust
/// Reset device to initial state
fn reset(&mut self) {
    debug!("virtio-net: device reset");
    self.status = 0;
    self.interrupt_status.store(0, Ordering::Relaxed);
    self.driver_features = 0;
    self.tx_avail_idx = 0;
    self.tx_used_idx = 0;
    self.rx_avail_idx = 0;
    self.rx_used_idx = 0;
    self.rx_queue = QueueState {
        num_max: 256,
        ..Default::default()
    };
    self.tx_queue = QueueState {
        num_max: 256,
        ..Default::default()
    };
    self.rx_buffer.clear();
}
```
```python
try:
    conn, _ = srv.accept()
except socket.timeout:
    break
start = time.perf_counter_ns()
with conn:
    # one read + one write keeps it a true CRR round-trip
    try:
        conn.recv(1)
        conn.sendall(b"x")
    except OSError:
        pass
samples.append((time.perf_counter_ns() - start) / 1000.0)
```
```sh
python3 - <<PY &
import os, signal, socket, threading, sys, time
port = int(os.environ.get("HOST_PORT", "$HOST_PORT"))
s = socket.socket()
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind(("127.0.0.1", port))
s.listen(64)
sys.stderr.write(f"echo-server: bound 127.0.0.1:{port}\n"); sys.stderr.flush()
def loop():
    while True:
        try: c, _ = s.accept()
        except OSError: return
        try:
            c.recv(1); c.sendall(b"x")
        except OSError: pass
        finally: c.close()
threading.Thread(target=loop, daemon=True).start()
time.sleep(60)
PY
```
```rust
let server_thread = thread::spawn(move || {
    let mut accepted = 0u32;
    listener.set_nonblocking(false).ok();
    let deadline = std::time::Instant::now() + Duration::from_secs(120);
    let (done_tx, _done_rx) = mpsc::channel::<()>();
    while accepted < iterations && std::time::Instant::now() < deadline {
        match listener.accept() {
            Ok((mut conn, _)) => {
                let mut buf = [0u8; 1];
                let _ = std::io::Read::read(&mut conn, &mut buf);
                let _ = std::io::Write::write_all(&mut conn, b"x");
                accepted += 1;
            }
            Err(_) => break,
        }
    }
```
```toml
# (Type::STREAM.nonblocking() needs the "all" feature flag)
socket2 = { version = "0.5", features = ["all"] }

# Lock-free MPMC queue used to hand virtio-net RX frames from the
# net-poll thread to the vCPU thread without taking the
# `Arc<Mutex<VirtioNetDevice>>` device lock on the hot path.
crossbeam-queue = "0.3"
```
```rust
/// Drain frames pushed into [`Self::pending_rx`] by the net-poll
/// thread and write them into the guest's RX descriptors.
///
/// Same descriptor-walking shape as [`Self::try_inject_rx`], but
/// the input frames come from the lock-free SegQueue instead of
/// going through the (locked) network backend. The vCPU thread
/// calls this on every MMIO entry to virtio-net, materialising any
/// frames the net-poll thread queued since the last MMIO exit.
///
/// Returns the number of frames written to the RX ring this call.
pub fn flush_pending_rx<M: GuestMemory + ?Sized>(&mut self, mem: &M) -> Result<usize> {
    let mut frames: Vec<Vec<u8>> = Vec::new();
    while let Some(f) = self.pending_rx.pop() {
        frames.push(f);
    }
    if !frames.is_empty() {
        self.write_frames_to_rx_ring(frames, mem)
    } else {
        Ok(0)
    }
}
```
Two scripts and a doc, deferred deliverable from docs/superpowers/plans/2026-04-27-smoltcp-passt-port.md § "passt head-to-head methodology".

scripts/bench-pasta.py

Drives the same workload shape as voidbox-network-bench (g2h throughput, RR p50/p99, CRR p50) against pasta running in a network namespace. Outputs JSON in the same Report shape so bench-compare-pasta.py can diff the two side by side. pasta is launched with --config-net + --map-host-loopback (default: gateway IP) so connecting to the host gateway from inside the netns reaches the host's 127.0.0.1. Mirrors voidbox's SLIRP convention (10.0.2.2 → 127.0.0.1) closely enough for the apples-to-apples CRR metric.

scripts/bench-compare-pasta.py

Reads two JSONs and emits a markdown side-by-side. Auto-detects which file is which via the `backend` field. Reports the gap as "voidbox N× faster/slower" so the direction is unambiguous.

docs/passt-comparison.md

Caveats + usage. Calls out that throughput numbers are NOT directly comparable (voidbox has VM/MMIO overhead pasta does not). CRR latency is the apples-to-apples metric: dominated by NAT-table operations on both sides.

Tested locally: pasta CRR p50 ≈ 80 µs, voidbox CRR p50 ≈ 10.1 ms on the same host. The gap is dominated by voidbox's poll-thread cadence + virtio-mmio exits, not NAT-table cost — a useful actionable signal for follow-up perf work.
Pair of artefacts used to root-cause the apparent 122x voidbox-vs-pasta
CRR p50 gap reported by scripts/bench-pasta.py.
tools/crr-client.c
Static-linked C binary that performs N TCP CRRs in one process,
no fork or exec per iteration. Output is one line of nanoseconds:
N P50 P99 MEAN. Compile with:
gcc -O2 -static -o /tmp/crr-client tools/crr-client.c
examples/crr_singleproc_bench.rs
Voidbox-side driver. Boots a sandbox with /tmp host-mounted into
the guest, runs the static binary inside the guest, parses the
one-line output. Measures voidbox's NAT-path CRR cost without the
outer bench's per-iteration nc fork+exec.
Result: voidbox-in-VM at 421 us p50 vs pasta-in-netns at 107 us p50
is dominated (~300 us of the ~314 us gap) by VM transit (virtio-mmio
exits, KVM IRQ injection, vsock RPC), not by SLIRP-engine cost.
A genuinely apples-to-apples SLIRP-vs-SLIRP comparison (passt+qemu
vs voidbox+voidbox-VM) is the natural follow-up; this commit captures
the tooling so that follow-up can stand on a reproducible baseline.
Boots a minimal qemu guest carrying tools/crr-client and runs N TCP CRRs against a host TCP server. Two backends:

--backend libslirp    qemu's built-in -netdev user (libslirp)
--backend passt       qemu -netdev stream + passt(1) over UNIX socket

Same workload + iteration count as scripts/bench-pasta.py and examples/crr_singleproc_bench.rs, so the five datapoints (host-direct, pasta-in-netns, qemu+libslirp, qemu+passt, voidbox+voidbox-SLIRP) are directly comparable on the same machine.

The script auto-builds the initramfs from tools/qemu-init.sh + busybox + tools/crr-client, including virtio_net + failover modules from the host kernel so a stock distro kernel can probe the qemu virtio-net-pci device. Voidbox's slim kernel has them built-in and the insmod calls fail harmlessly.

Result on the dev machine:

host-direct              63 us p50
pasta (netns, no VM)    107 us p50
qemu+libslirp (in VM)   181 us p50
qemu+passt (in VM)      163 us p50
voidbox+voidbox-SLIRP   421 us p50

Voidbox is ~2.2x slower than the mature C SLIRPs in the same VM-attached configuration -- the genuine engine gap, independent of fork artefact (10x) and VM transit (which both sides pay).
Four small wins on the per-packet path between the SlirpBackend's
inject queue and the guest, identified by the SLIRP-vs-SLIRP
comparison (voidbox 421 us p50 vs qemu+passt 163 us p50 on the
single-process TCP CRR benchmark).
src/devices/virtio_net.rs::try_inject_rx
- Read avail.idx ONCE per call instead of per frame. The driver
only bumps it when adding new buffers; per-frame re-reads are
redundant guest-memory accesses.
- Replace 'let used_elem = [...].concat()' with a stack [u8; 8].
The previous code allocated a Vec<u8> per injected frame in the
hot path; the new code costs four byte copies and zero allocs.
- Write used.idx ONCE at the end of the batch rather than after
every frame. The virtio spec only requires a single update per
publish; per-frame writes were redundant guest-memory accesses.
- Return frames_injected (usize) so callers can pulse the IRQ
line conditionally on actual new RX work.
src/devices/virtio_net.rs::process_tx_queue
- Replace per-frame Vec::concat with stack [u8; 8] (same fix as
the RX path).
- Read each TX descriptor segment directly into the packet buffer
via packet.resize() + mem.read(&mut packet[off..]) instead of
allocating an intermediate Vec<u8> and extend_from_slice'ing.
Saves one allocation and one full memcpy per descriptor segment.
- Reuse a single Vec<u8> packet buffer with capacity 1600 across
all frames in the call instead of allocating fresh per frame.
- Batch used.idx update at end of the batch (same as RX).
src/vmm/mod.rs::net_poll_thread
- Track previous-cycle pending state. Pulse KVM_IRQ_LINE only
when (a) we actually injected new RX frames this cycle OR (b)
interrupt_status went from clear -> pending across cycles.
Previously the loop pulsed twice (assert level=1, then deassert
level=0) on every cycle while interrupt_status was non-zero,
even when the guest hadn't acked the previous pulse and no new
work had arrived. Skipping the pulse pair when there's nothing
new saves two ioctl(KVM_IRQ_LINE) calls per redundant cycle
(~5-10 us each on the CRR hot path).
Effect on the single-process CRR p50 (mean of 5 runs of 30
iterations each, voidbox+voidbox-SLIRP):
before: 421 us p50 mean
after: 380 us p50 mean (~10% improvement)
The IRQ pulse change is the dominant contributor; the RX/TX heap
allocation removals are correct cleanup but contribute below
sample variance. Voidbox's gap to qemu+passt (163 us) shrinks
from 2.6x to 2.3x; remaining gap candidates are MMIO exit cost,
KVM_IRQ_LINE vs irqfd, and SlirpBackend lock contention.
The voidbox net-poll thread was raising IRQ 10 with two ioctl(KVM_IRQ_LINE) calls per pulse: assert level=1, then deassert level=0. Each ioctl is a syscall (~few us each on KVM); on the TCP CRR hot path with multiple IRQ deliveries per connection, the ioctl pair became a measurable share of per-iteration cost.

Replace with KVM_IRQFD: one eventfd registered with the in-kernel irqchip via vm_fd().register_irqfd(&eventfd, 10) at thread startup. Pulsing the IRQ is now a single 8-byte write to the eventfd; the kernel asserts the IRQ line directly without a userspace round-trip through ioctl().

The legacy KVM_IRQ_LINE path is kept as a fallback when irqfd registration fails (kernel without irqfd support, irqchip routing not initialised). In normal operation the eventfd succeeds at startup and the legacy ioctls never run.

Effect on the single-process CRR p50 (mean over 5 runs of 30 iterations, voidbox+voidbox-SLIRP):

before this commit: ~380 us p50
after this commit:  ~335 us p50 (~12% reduction)

Cumulative with the previous virtio-net hot-path cleanups:

baseline:        421 us p50
after all fixes: ~335 us p50 (~20% cumulative reduction)

Voidbox's gap to qemu+passt (163 us) shrinks from 2.6x to 2.0x.
Without ioeventfd, every guest TX (write to QUEUE_NOTIFY MMIO with value=1) forces a KVM_RUN exit: the vCPU thread dispatches into virtio-net's write_mmio handler, calls process_tx_queue, then re-enters KVM_RUN. On the TCP CRR hot path with multiple TX per connection that's a few microseconds of pure VM-exit overhead per packet on top of the actual network work.

Register the eventfd at MMIO addr 0xd000_0050 with datamatch=1 (TX queue notify only). Now KVM consumes the matching MMIO write in-kernel and signals the eventfd; the vCPU continues running uninterrupted. The net-poll thread sees the eventfd alongside flow events on the existing EpollDispatch (under a token in a tag space that doesn't collide with PROTO_TAG_*), drains it, and calls process_tx_queue on its own schedule.

Notifies for queue 0 (RX, value=0) still take the slow path through the MMIO write handler — they're rare (only when the guest adds new RX buffers) so the optimisation isn't needed there. Falls back to the synchronous MMIO-exit path if eventfd creation or KVM_IOEVENTFD registration fails.

Effect on the single-process CRR p50 (mean over 5 runs of 30 iterations, voidbox+voidbox-SLIRP):

before this commit: ~335 us p50
after this commit:  ~278 us p50 (~17% reduction)

Cumulative across the recent perf series:

baseline:              421 us p50
+ virtio-net cleanups: ~380 us p50
+ KVM_IRQFD:           ~335 us p50
+ KVM_IOEVENTFD:       ~278 us p50 (~34% cumulative)

Voidbox's gap to qemu+passt (163 us) shrinks from 2.6x to 1.7x.
Restructures the host->guest RX path to eliminate the
Arc<Mutex<VirtioNetDevice>> contention between the net-poll thread
and the vCPU thread. Inspired by the user-suggested Option B:
"net-poll -> rx_queue[vCPU] -> that vCPU consumes".
Before:
net-poll thread:
let mut g = net_dev.lock(); // takes device mutex
g.try_inject_rx(mem); // descriptor walk + writes
drop(g);
pulse_irq();
vCPU thread on MMIO exit:
let g = net_dev.lock(); // waits for net-poll
g.mmio_read(...);
After:
net-poll thread:
drain backend frames into a Vec; // backend mutex only
push each frame to pending_rx; // lock-free SegQueue
pulse_irq(); // never touches device mutex
vCPU thread on MMIO exit:
let mut g = net_dev.lock(); // uncontended now
g.flush_pending_rx(mem); // descriptor writes here
g.mmio_read/mmio_write(...);
Net-poll's hot path no longer holds the VirtioNetDevice mutex at
all -- it only acquires the SLIRP backend Arc independently. vCPU's
MMIO exits do the descriptor work in-context, paying for it once per
exit but never waiting on a held lock.
Implementation:
src/devices/virtio_net.rs
- new field pending_rx: Arc<crossbeam_queue::SegQueue<Vec<u8>>>
- pending_rx() accessor returns a clone of the Arc
- slirp_arc() exposes the backend Arc for direct net-poll access
- new method flush_pending_rx(&mut self, mem) drains the SegQueue
and writes RX descriptors using the same loop as try_inject_rx
- try_inject_rx is now a thin wrapper that calls a new shared
helper write_frames_to_rx_ring; same behaviour, structured
so flush_pending_rx can share the descriptor-writing logic.
src/vmm/mod.rs::net_poll_thread
- Cache pending_rx + slirp Arcs once at thread startup; never
touch the VirtioNetDevice mutex on the per-cycle path.
- Drain backend frames into a reusable Vec, wrap each with a
virtio-net header, push to the SegQueue, then pulse the IRQ.
src/vmm/cpu.rs (MMIO dispatch)
- Call guard.flush_pending_rx(guest_memory) at the top of the
virtio-net MMIO read AND write handlers. Materialises any
frames the net-poll thread queued since the last MMIO exit.
Adds: crossbeam-queue = "0.3".
Effect on the single-process CRR p50 (mean over 5 runs of 30
iterations, voidbox+voidbox-SLIRP):
before this commit: ~278 us p50
after this commit: ~265 us p50 (~5% reduction)
Modest improvement on the single-vCPU benchmark we have available --
the win is mostly architectural (eliminates a contention point that
will become more meaningful with multi-vCPU guests, higher pps, and
parallel TX/RX paths).
Cumulative across the whole perf series:
baseline: 421 us p50
+ virtio-net cleanups: ~380 us p50
+ KVM_IRQFD: ~335 us p50
+ KVM_IOEVENTFD: ~278 us p50
+ Option B SegQueue: ~265 us p50 (~37% cumulative)
Voidbox's gap to qemu+passt (163 us) is now ~1.6x.
Wraps the device's interrupt_status register in Arc<AtomicU32> so the
net-poll thread can read and update it without taking the device
mutex. Three concrete benefits:
1. has_pending_interrupt() is now a single relaxed atomic load on
&self -- safe to call from any thread, no lock, no contention.
2. The net-poll thread caches a clone of the Arc at startup and
uses it directly for its idle-cycle 'do I need to pulse the IRQ?'
check, removing one mutex acquisition per cycle.
3. interrupt_status |= 1 (set by RX inject) and interrupt_status &=
!value (cleared by guest's INTERRUPT_ACK MMIO write) are now
fetch_or / fetch_and atomic operations -- no read-modify-write
race between the vCPU thread and the net-poll thread.
The vCPU thread's MMIO read of INTERRUPT_STATUS still goes through
the device mutex via the existing dispatcher, but the underlying
operation is now a pure atomic load -- a follow-up that lets the
dispatcher skip the lock for read-only MMIO accesses gets a cleaner
path because the field no longer needs synchronisation through the
mutex.
Single-vCPU CRR is within sample noise of the previous measurement
(~265 us p50 -> ~289 us across 5 runs of 30 iterations); the win is
mostly architectural rather than measurable on this workload. Real
benefit shows up with multi-vCPU guests, higher pps, or workloads
where the net-poll and vCPU threads contend more aggressively.
Collects the SLIRP-vs-SLIRP / vs-pasta diagnostic tooling under one
directory. Five files relocate, no behaviour change:
scripts/bench-pasta.py -> tools/perf-harness/bench-pasta.py
scripts/bench-compare-pasta.py -> tools/perf-harness/bench-compare-pasta.py
scripts/bench-qemu-slirp.sh -> tools/perf-harness/bench-qemu-slirp.sh
tools/crr-client.c -> tools/perf-harness/crr-client.c
tools/qemu-init.sh -> tools/perf-harness/qemu-init.sh
Updates path references in:
- bench-qemu-slirp.sh (uses $SCRIPT_DIR for qemu-init.sh location;
updated busybox extraction to climb two dirs up to repo root)
- examples/crr_singleproc_bench.rs (doc + error message paths)
- docs/passt-comparison.md (usage examples + extended example block
that now also covers bench-qemu-slirp.sh and crr_singleproc_bench)
Smoke-tested after the move:
- tools/perf-harness/bench-pasta.py --iterations 1 ... passes
- tools/perf-harness/bench-qemu-slirp.sh --backend libslirp passes
Eight follow-up fixes from PR #81 review:

src/vmm/mod.rs:

Extract `setup_tx_notify_ioeventfd` helper and gate the entire IOEVENTFD path on `epoll_arc.is_some()`. Fixes the original safety concern: the previous code registered KVM_IOEVENTFD even when no epoll dispatcher was available, which would have left guest TX notifies trapped in-kernel with no userspace drain — a silent hang. The helper rolls back the epoll registration if KVM_IOEVENTFD registration fails, so the two halves succeed or fail together.

examples/crr_singleproc_bench.rs:

Switch the host-side accept thread to non-blocking accept with a deadline check so the example never hangs forever if the guest fails to connect. The initial Copilot suggestion of a 2 ms sleep inflated each guest CRR sample by ~1.8 ms (sleep latency directly added to per-iter accept-pickup time). Reduced to 50 µs to keep the sample noise below the metric resolution.

tools/perf-harness/bench-pasta.py:

- `detect_host_gateway` now parses the route line by the `via` keyword instead of indexing parts[2], so non-standard route formats don't silently pick up the wrong field.
- CRR timer started before `srv.accept()` to match the voidbox-network-bench `crr_echo_server` semantics.

tools/perf-harness/bench-qemu-slirp.sh:

- Replace `time.sleep(60)` with `threading.Event().wait()` so the host echo server stays alive for the entire qemu run instead of timing out at 60 s.
- Add fail-fast bind error handling so port collisions surface immediately instead of producing a confusing "no result" later.

tools/perf-harness/qemu-init.sh:

Derive the netmask from the CIDR prefix instead of hardcoding 255.255.255.0, so non-/24 networks work.

tools/perf-harness/bench-compare-pasta.py:

Remove unused `sign` variable.

docs/passt-comparison.md:

Update path reference from `scripts/` to `tools/perf-harness/`.

Verified: voidbox single-process CRR p50 stays at ~280-310 µs (within noise of pre-fix baseline) and `cargo test --test network_baseline` passes 24/24.
9394dd6 to 3c5da08
Replace `std::mem::take(&mut *queue)` with an in-place
`extend_from_slice` + `clear()` against a scratch Vec owned by
`SlirpBackend`. The previous pattern moved the queue's allocation
out and left a fresh `Vec::new()` (cap=0) behind, forcing the next
`push_ready_events` to grow `extend_from_slice` from cap=0 every
cycle.
Heaptrack on the single-process CRR bench (30 iters) measured
this single callsite as ~half of all allocations during the run:
before: push_ready_events 4843 allocs (49% of total)
drain_to_guest 4776 allocs (48% of total)
total 12618 allocs
after: push_ready_events gone from top callers
drain_to_guest 3957 allocs (still hot, downstream)
total 6885 allocs (-45%)
p50 CRR latency is unchanged (~270 µs); the wall-clock floor is
elsewhere on this workload. The win is reduced allocator churn
(GC pressure, jitter on bulk paths, fewer slow-path mallocs under
sustained load) — visible in the throughput bench rather than CRR
microbench.
The `pending_events` Mutex<Vec> is also pre-sized to
`EVENTS_PRESIZE = 128` at construction so the very first push
doesn't reallocate.
The SLIRP backend's per-second new-connection rate limit
(`max_connections_per_second`, default 50/s) and concurrent-
connection ceiling (`max_concurrent_connections`, default 64) are
production anti-DoS defaults baked into `LocalSandbox`. They are
hostile to microbenches that intentionally open hundreds of
connections in a tight loop — at 51 connects/s the limiter starts
returning RST to the guest, which crr-client sees as
`ECONNREFUSED` on its very next connect and exits with rc=3.
Reproduced as the "100-iter failure" in `crr_singleproc_bench`:
30 iters worked, 60 iters did not; the threshold was the 50/s
limit, not anything in the network stack itself.
Surface the two ceilings on `Sandbox::local()` as builder methods:
.network_max_connections_per_second(u32::MAX)
.network_max_concurrent_connections(usize::MAX)
`None` keeps the production defaults, so this is purely additive.
The bench now uses both. 500-iter run reproduces clean
(p50 268 µs, p99 1.6 ms, host accepts 500/500).
Both `flush_pending_rx` and `try_inject_rx` previously built a
fresh `Vec<Vec<u8>>` on every MMIO exit and handed it to
`write_frames_to_rx_ring`, which consumed it by value. The
pattern dropped the outer-Vec allocation and forced the next call
to grow it from cap=0 — heaptrack on the CRR microbench measured
the flush_pending_rx site at 173 calls / 108 MB peak, the largest
remaining alloc consumer after the SLIRP `ready_scratch` fix.
`write_frames_to_rx_ring` now takes `&mut Vec<Vec<u8>>` and drains
in place via `drain(..)` / `append`, so callers reuse a long-lived
scratch buffer:
- `flush_pending_rx` uses a new `flush_scratch` field on
`VirtioNetDevice`, populated from `pending_rx` (SegQueue) and
cleared at end.
- `try_inject_rx` reuses the existing `rx_scratch` field that
was already paired with `get_rx_frames`; the trailing
`mem::take` in `get_rx_frames` is now followed by a
`clear()` + restore at the end of `try_inject_rx`, so the
capacity persists across the round-trip.
Heaptrack on 100-iter CRR:
before this commit: 6885 allocs / 30 iters = 229/iter
after this commit: 18926 allocs / 100 iters = 189/iter
Aggregate from the original baseline:
baseline (before all fixes): ~421 allocs/iter
this commit: ~189 allocs/iter (-55%)
p50 latency unchanged at ~275 µs as expected — alloc reduction
shows up in throughput and tail-latency stability, not the CRR
floor.
`relay_tcp_nat_data` builds a temporary `Vec<Vec<u8>>` per call because the relay can't push directly to `inject_to_guest` while iterating `flow_table` (both are `&mut self`). The previous pattern allocated a fresh `Vec::new()` every cycle, which heaptrack flagged as the biggest remaining contributor inside `drain_to_guest`'s call tree after the prior `ready_scratch` and `flush_scratch` fixes.

Move the buffer onto `SlirpBackend` as `relay_frames_scratch` and use the standard `mem::take` → process → restore pattern so the buffer's capacity persists across `drain_to_guest` calls. The two trailing `inject_to_guest.append(&mut frames_to_inject)` sites already preserve capacity (Vec::append leaves the source empty but with its allocation intact); only the entry-point `Vec::new()` was discarding work.

Cumulative impact on the 100-iter CRR microbench:

baseline (before any of these fixes):  ~421 allocs/iter
after ready_scratch + flush_scratch:   ~189 allocs/iter
after relay_frames_scratch (this PR):   ~93 allocs/iter (-78%)

p50 latency continues at ~275 µs; the floor is dominated by KVM-exit / wakeup costs, not allocator churn. The win shows up under sustained load where reduced allocator pressure improves tail-latency stability and per-frame jitter.
Three of the relay functions called from `drain_to_guest`
(`relay_tcp_nat_data`, `relay_icmp_echo`, `relay_udp_flows`) each built a
per-call `Vec<FlowKey>` to side-step the `&mut self` / `flow_table` borrow
conflict. The Vecs were allocated, populated, drained, and dropped on every
cycle. The UDP relay built two — one for the stale-sweep, one for the
readiness loop.

Add a single `flow_keys_scratch: Vec<FlowKey>` field on `SlirpBackend` and
rotate it through all four sites with the `mem::take` → process → restore
pattern (the relays run sequentially inside `drain_to_guest`, so one buffer
suffices). Each iteration uses `Vec::drain(..)` instead of for-by-value so
capacity is preserved across the consume.

Heaptrack on the 100-iter CRR microbench:
  before this commit: 9296 allocs (~93/iter)
  after this commit:  4103 allocs (~41/iter)
  temporary allocs:   5546 → 574 (-90%)

Cumulative from the original baseline (start of this round):
  ~421 allocs/iter → ~41 allocs/iter (-90%)

p50 latency unchanged at ~275 µs as predicted; the wall-clock floor is
dominated by KVM exits / vCPU wakeups. The gain shows up as reduced
allocator pressure on bulk paths and fewer slow-path mallocs under
sustained load.

Top remaining alloc callsites are now per-frame `Vec<u8>` from
`build_tcp_packet_static` (one allocation per TCP frame) and TX queue frame
parsing — both intrinsic to the protocol shape; further reduction needs a
pool/arena, not a scratch hoist.
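A minimal sketch of the rotation, with illustrative types (the real `FlowKey` and relay bodies differ). The key point is `drain(..)`: consuming the Vec by value would drop its allocation, while draining leaves the capacity in place for the next relay pass:

```rust
use std::collections::HashMap;
use std::mem;

// Illustrative flow key; the real type carries the full 5-tuple.
type FlowKey = (u32, u16);

struct Backend {
    flow_table: HashMap<FlowKey, u64>,
    flow_keys_scratch: Vec<FlowKey>,
}

impl Backend {
    fn relay_pass(&mut self) {
        // Take the scratch so iterating flow_table doesn't conflict with
        // the &mut self calls inside the loop.
        let mut keys = mem::take(&mut self.flow_keys_scratch);
        keys.extend(self.flow_table.keys().copied());
        // drain(..) empties `keys` but keeps its allocation.
        for key in keys.drain(..) {
            self.touch(key);
        }
        self.flow_keys_scratch = keys; // capacity persists to the next pass
    }

    fn touch(&mut self, key: FlowKey) {
        if let Some(v) = self.flow_table.get_mut(&key) {
            *v += 1;
        }
    }
}
```

Because the relays run one after another, a single scratch field can serve all four sites without aliasing.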
Same fix as `crr_singleproc_bench`: the bench's CRR phase opens 30 connections in <1s, which trips the production SLIRP rate limiter (50 conn/s) and surfaces as a 2 s "crr echo channel receive error" instead of a real number. Use the new `Sandbox::local()` rate-limit knobs to lift both ceilings (max_connections_per_second + max_concurrent_connections) explicitly. Production sandboxes are unaffected — the lift is opt-in.
Plan doc for the next perf round. After #81's user-space alloc reductions
exhausted (-90% allocs/iter, p50 unchanged), the remaining floor is
kernel↔userspace transitions, MMIO exits, and single-queue serialization.

Three experiments in scope, ranked by risk × payoff:

1. io_uring for SLIRP host-socket I/O — start here
2. splice() / sendfile() zero-copy on bulk paths
3. MSI-X virtio + multi-queue for vCPU scaling

Non-goal: TAP + passt-style host bypass. Routing through an external passt
would close the latency gap to passt but moves the DNS interception,
port-forwarding, deny-list, and rate-limiting feature surface out of
voidbox — and loses the in-process observability we currently get from
instrumenting SLIRP directly. Full SLIRP-path observability is a hard
requirement.

Each experiment lands as its own commit, gated behind a Cargo feature so
the #81 baseline can A/B against it without a revert. Measurements use the
harness shipped in #81.
First commit on the architectural-experiments branch (#83). Adds a
`UringBatch` wrapper around `io_uring::IoUring` with the submit / drain
shape the SLIRP relay will use to batch host-socket recv / send into single
`io_uring_enter` round-trips.

Key shape:

- One `UringBatch` is single-owner: the SLIRP `net_poll_thread` constructs
  and drives one. No locking, no cross-thread sharing.
- SQEs are tagged with `(UringOp, correlation_id)` packed into `user_data`
  so the completion drain routes a CQE back to its originating flow without
  a side table. Low 32 bits = correlation id, top 32 bits = op tag.
- `submit_recv` / `submit_send` are `unsafe` because the kernel references
  the user buffer asynchronously; the caller's safety contract requires
  `buf` to outlive the matching CQE.
- The existing `EpollDispatch` keeps owning the readiness signal — io_uring
  replaces only the data-plane syscalls, not the wake-up. The two layers
  stay separable so the feature can be toggled off without touching the
  relay state machine.

Behavior unchanged: nothing wires this in yet. Cargo feature `io-uring`
(off by default) gates both the new module and the `io-uring = "0.7"`
dependency. Module is `#![allow(dead_code)]` for now; the next commit on
this branch wires the relay TCP recv / send paths through it and removes
the allow.

Tests:

- 4 unit tests in `src/network/uring.rs` cover the user-data round trip +
  a real `submit_send` -> `submit_recv` cycle across a `socketpair`
  (skipped on kernels without io_uring).
- `cargo test --features io-uring --lib`: 381 passed.
- `cargo test --test network_baseline` (default features): 24/24.
- `cargo clippy --all-targets [-- -D warnings]` clean both with and without
  the feature.

Methodology per `docs/perf-architectural-experiments.md`: each experiment
lands as one feature-gated commit so the #81 baseline can A/B against it
without a revert. This is the infrastructure commit; the next one wires +
measures.
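The `user_data` tagging can be sketched as below. The layout follows the commit text (low 32 bits = correlation id, top 32 bits = op tag); the `UringOp` variants and function names are assumptions for illustration:

```rust
// Hypothetical op tags; the real UringOp enum may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
enum UringOp {
    Recv = 0,
    Send = 1,
}

// Pack (op tag, correlation id) into the 64-bit SQE user_data field.
fn pack_user_data(op: UringOp, correlation_id: u32) -> u64 {
    ((op as u64) << 32) | correlation_id as u64
}

// On CQE drain: recover (op tag, correlation id) without a side table.
fn unpack_user_data(user_data: u64) -> (u32, u32) {
    ((user_data >> 32) as u32, user_data as u32)
}
```

Since the id travels inside the kernel-owned `user_data`, completions route back to their flow with no allocation and no lookup.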
Companion to `crr_singleproc_bench`: drives M concurrent crr-client
processes in the same guest so the SLIRP relay sees N>1 ready flows per
`net_poll_thread` cycle. The single-flow microbench can't see io_uring
batching or multi-queue wins because there's nothing to batch or
parallelize with one ready flow at a time; this bench is the workload the
architectural experiments on this branch (#83) need.

Each per-flow `crr-client` writes its summary line to its own
`/tmp/crr_results/$i.txt`; the trailing shell loop concatenates all M lines
for the host to parse. Aggregation reports median-of-p50s, max p99,
mean-of-means, and aggregate qps. Note: busybox-static lacks `seq`, so the
flow-id list is materialized on the host and inlined into the shell
command.

## Baseline (this branch's tip = #81 + io_uring scaffold)

Single net_poll_thread, no architectural changes wired:

| M | Median p50 | Max p99 | Aggregate qps |
|---|-----------:|--------:|--------------:|
| 1 | 275 µs | ~2 ms | ~3636 |
| 2 | 473 µs | 12.9 ms | 2173 |
| 4 | 732 µs | 13.2 ms | 2370 |
| 8 | 2043 µs | 14.5 ms | 2242 |

Reading:

- Aggregate qps saturates at ~2200-2400 regardless of M — the single
  net_poll_thread is the bottleneck.
- Per-flow p50 grows ~linearly with M (at M=8 each flow takes 7.4× the M=1
  p50).
- p99 jumps to 12-14 ms already at M=2; tail latency is dominated by
  per-flow head-of-line blocking through the single epoll loop.

This is exactly the workload io_uring batching, splice, and multi-queue
should move. The io_uring wiring lands in the next commit on this branch
with measurements against this table.
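The host-side aggregation described above can be sketched as follows. The `FlowSummary` fields are illustrative; the real harness parses them from the concatenated `/tmp/crr_results/$i.txt` lines:

```rust
// Hypothetical per-flow summary parsed from one result line.
#[derive(Clone, Copy)]
struct FlowSummary {
    p50_us: f64,
    p99_us: f64,
    mean_us: f64,
    qps: f64,
}

// Returns (median-of-p50s, max p99, mean-of-means, aggregate qps).
fn aggregate(flows: &[FlowSummary]) -> (f64, f64, f64, f64) {
    let mut p50s: Vec<f64> = flows.iter().map(|f| f.p50_us).collect();
    p50s.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let median_p50 = p50s[p50s.len() / 2]; // upper median for even M
    let max_p99 = flows.iter().map(|f| f.p99_us).fold(f64::MIN, f64::max);
    let mean_of_means =
        flows.iter().map(|f| f.mean_us).sum::<f64>() / flows.len() as f64;
    let agg_qps = flows.iter().map(|f| f.qps).sum::<f64>();
    (median_p50, max_p99, mean_of_means, agg_qps)
}
```

Median-of-p50s rather than mean keeps one stalled flow from hiding in the headline number, while max p99 surfaces the worst head-of-line victim.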
Summary
Originally a passt/pasta comparison harness — has since grown into a full SLIRP perf-improvement series driven by the harness measurements and a heaptrack-driven follow-up round.
Final results
Measured on the same Fedora 43 / KVM host: `voidbox-network-bench
--iterations 3` plus the single-process CRR microbench at
`--iterations 100` (the `voidbox-network-bench` CRR number remains
busybox-nc-fork-bound).

Reading:
- `tcp_crr_latency_us_p50` from `voidbox-network-bench` is dominated by
  `busybox-nc` fork+exec per iteration, not SLIRP. The single-process CRR
  bench (~275 µs) reflects the actual NAT path.

What's new
Harness (the original PR scope)
- `tools/perf-harness/bench-pasta.py` — drives the same workload shape as
  `voidbox-network-bench` (`tcp_throughput_g2h_mbps`,
  `tcp_rr_latency_us_p50/p99`, `tcp_crr_latency_us_p50`) against pasta
  running in a network namespace. Outputs JSON in the same `Report` shape.
- `tools/perf-harness/bench-compare-pasta.py` — reads two JSONs and emits a
  markdown side-by-side. Auto-detects which file is voidbox vs pasta via
  the `backend` field.
- `tools/perf-harness/bench-qemu-slirp.sh` + `qemu-init.sh` +
  `crr-client.c` — qemu side of a proper SLIRP-vs-SLIRP head-to-head
  (qemu+libslirp / qemu+passt vs voidbox+SLIRP).
- `examples/crr_singleproc_bench.rs` — voidbox-side single-process CRR
  diagnostic that pairs with the C `crr-client`. Isolates the NAT path from
  the original bench's per-iteration `nc` fork+exec overhead.
- `docs/passt-comparison.md` — usage + methodology caveats.

Perf round 1 — wall-clock CRR optimizations
Five commits driven by the harness exposing a 122× CRR gap that turned out
to be `net_poll_thread`'s 5 ms active cadence:

- virtio-net hot-path cleanups + suppression of redundant IRQ pulses
- IRQ delivery via `KVM_IRQFD` instead of a `KVM_IRQ_LINE` pair
- `KVM_IOEVENTFD` for the virtio-net TX queue notify
- lock-free RX hand-off via `SegQueue` (removes `Arc<Mutex<VirtioNetDevice>>`
  contention against the vCPU)
- `interrupt_status` as `Arc<AtomicU32>` (allows concurrent ack between
  vCPU and net-poll thread)

Perf round 2 — heaptrack-driven allocation hoisting
heaptrack on the same workload found that 97% of allocations during the
bench were per-cycle `Vec` growth in the SLIRP / virtio-net hot path —
primarily `mem::take(&mut *queue)`-style discards of buffer capacity. Four
surgical commits hoist scratch Vecs to long-lived fields:

- `ready_scratch` (events Vec) — replaces `mem::take` on `pending_events`
  with `clear()` + `extend_from_slice`.
- `flush_scratch` (RX-inject Vec<Vec>) — `write_frames_to_rx_ring` now
  takes `&mut Vec` and drains in place.
- `relay_frames_scratch` — `relay_tcp_nat_data`'s deferred frame Vec.
- `flow_keys_scratch` — single shared `Vec<FlowKey>` rotated across the
  TCP/ICMP/UDP relays via the `mem::take` pattern.

Per-step allocation reduction on the 100-iter CRR bench:

| Step | allocs/iter |
|---|---:|
| baseline | ~421 |
| `ready_scratch` | ~229 |
| `flush_scratch` | ~189 |
| `relay_frames_scratch` | ~93 |
| `flow_keys_scratch` | ~41 |

p50 latency unchanged at ~275 µs as predicted; the wall-clock floor is
dominated by KVM exits / vCPU wakeups, not allocator churn.
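The `ready_scratch` change trades `mem::take` (which hands the allocation away each cycle) for copy-into-a-long-lived-buffer. A minimal sketch with illustrative types:

```rust
// Illustrative stand-in for the net-poll event queue; the real
// pending_events and ready_scratch hold event structs, not u64s.
struct Poller {
    pending_events: Vec<u64>,
    ready_scratch: Vec<u64>,
}

impl Poller {
    // Returns the ready events without giving up either Vec's capacity.
    fn drain_ready(&mut self) -> &[u64] {
        self.ready_scratch.clear(); // keeps capacity
        self.ready_scratch.extend_from_slice(&self.pending_events);
        self.pending_events.clear(); // also keeps capacity
        &self.ready_scratch
    }
}
```

After warm-up, both Vecs sit at their high-water capacity and the per-cycle allocation count drops to zero on this path.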
Bench infrastructure fixes
- `Sandbox::local()` builder methods (`network_max_connections_per_second`,
  `network_max_concurrent_connections`). Production defaults (50 conn/s,
  64 concurrent) hard-rejected the bench's >50 connect/s pattern; both
  `crr_singleproc_bench` and `voidbox-network-bench` now lift both ceilings
  explicitly. Surfaced as a 100-iter "Connection refused" failure during
  the heaptrack work.
- `crr_singleproc_bench` accept-loop: 50 µs non-blocking poll instead of a
  2 ms sleep (the latter inflated each guest CRR sample by ~1.8 ms, an 8×
  regression in earlier review-fix versions).
- `bench-qemu-slirp.sh`: server stays alive for the full qemu run (was
  60 s); fail-fast on bind error.
- `bench-pasta.py`: gateway parsed by the `via` keyword; CRR timer starts
  before `accept()` to match `voidbox-network-bench` semantics.
- `qemu-init.sh`: netmask derived from the CIDR prefix (was hardcoded /24).
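The accept-loop fix can be sketched as below: a non-blocking `accept()` polled every 50 µs, so a pending connection is picked up within one poll interval instead of sitting behind a 2 ms sleep. Function name and shape are illustrative, not the bench's actual code:

```rust
use std::io::ErrorKind;
use std::net::TcpListener;
use std::time::Duration;

// Illustrative sketch: accept one connection with a 50 µs poll cadence.
fn accept_one(listener: &TcpListener) -> std::io::Result<std::net::TcpStream> {
    listener.set_nonblocking(true)?;
    loop {
        match listener.accept() {
            Ok((stream, _addr)) => return Ok(stream),
            // No pending connection yet: back off 50 µs, not 2 ms.
            Err(e) if e.kind() == ErrorKind::WouldBlock => {
                std::thread::sleep(Duration::from_micros(50));
            }
            Err(e) => return Err(e),
        }
    }
}
```

With a 2 ms sleep the expected wait per accept is ~1 ms and the worst case 2 ms, which is why each guest CRR sample was inflated by roughly that amount.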
pasta is the same forwarding/NAT engine as passt minus the qemu glue — it
runs in a network namespace, reachable as `pasta -- COMMAND`. The
lower-friction first cut. Throughput numbers are not directly comparable
(pasta has no VM transit), but CRR latency is apples-to-apples because it's
dominated by NAT-table operations on both sides. A proper qemu+passt rig
also exists in `tools/perf-harness/bench-qemu-slirp.sh`.

Usage
Test plan
- `cargo fmt --all -- --check` clean
- `cargo clippy --workspace --all-targets --all-features -- -D warnings` clean
- `cargo test --test network_baseline` — 24/24
- `examples/crr_singleproc_bench` — 100-iter and 500-iter runs clean (host
  accepts N/N)
- `voidbox-network-bench --iterations 3` post-round-2: g2h 11707 Mbps,
  RR p50/p99 = 2/18 µs

Follow-ups (not in this PR)

- Top remaining per-frame allocations are `build_tcp_packet_static` and
  TX-queue frame parsing; eliminating those needs a pool/arena, not a
  scratch hoist.
- `bench-qemu-slirp.sh` is the harness; the perf comparison itself is
  documented separately.